Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections



Similar resources

Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive algorithm such as gzip. Random access to a specific document then requires decompression of a block. The choice of block size is critical: it trades between compressi...


Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval

Self-indexes – data structures that simultaneously provide fast search of and access to compressed text – are promising for genomic data but in their usual form are not able to exploit the high level of replication present in a collection of related genomes. Our ‘RLZ’ approach is to store a self-index for a base sequence and then compress every other sequence as an LZ77 encoding relative to the...


Lempel-Ziv Dimension for Lempel-Ziv Compression

This paper describes the Lempel-Ziv dimension (a Hausdorff-like dimension inspired by the LZ78 parsing), its fundamental properties, and its relation to Hausdorff dimension. It is shown that in the case of individual infinite sequences, the Lempel-Ziv dimension matches the asymptotic Lempel-Ziv compression ratio. This fact is used to describe results on Lempel-Ziv compression in terms of dime...


Computing Reversed Lempel-Ziv Factorization Online

Kolpakov and Kucherov proposed a variant of the Lempel-Ziv factorization, called the reversed Lempel-Ziv (RLZ) factorization (Theoretical Computer Science, 410(51):5365–5373, 2009). In this paper, we present an on-line algorithm that computes the RLZ factorization of a given string w of length n in O(n log n) time using O(n log σ) bits of space, where σ ≤ n is the alphabet size. Also, we introd...


Lempel-Ziv factorization: Simple, fast, practical

For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...
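
The blocked baseline described in the first abstract above (documents grouped into fixed-sized blocks, each block compressed with a general-purpose compressor, random access by decompressing one block) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the function names `build_blocks` and `get_document`, the 4096-byte default, and the use of Python's `zlib` in place of gzip are all assumptions made for the example.

```python
import zlib


def build_blocks(docs, block_size=4096):
    """Pack documents (bytes) into blocks of at most about block_size bytes
    and compress each block independently. Names and sizes are hypothetical."""
    blocks = []  # compressed blocks
    index = []   # per document: (block number, offset inside block, length)
    current = bytearray()
    for doc in docs:
        # Close the current block if adding this document would exceed the budget.
        if current and len(current) + len(doc) > block_size:
            blocks.append(zlib.compress(bytes(current)))
            current = bytearray()
        index.append((len(blocks), len(current), len(doc)))
        current += doc
    if current:
        blocks.append(zlib.compress(bytes(current)))
    return blocks, index


def get_document(blocks, index, doc_id):
    """Random access: the whole containing block is decompressed,
    then the requested document is sliced out of it."""
    block_no, offset, length = index[doc_id]
    data = zlib.decompress(blocks[block_no])
    return data[offset:offset + length]


docs = [b"a" * 3000, b"b" * 3000, b"c" * 10]
blocks, index = build_blocks(docs)
retrieved = get_document(blocks, index, 2)
```

With larger blocks the compressor sees more context per block, but each retrieval has to decompress more data, which is the trade-off the abstract refers to.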



Journal

Journal title: Proceedings of the VLDB Endowment

Year: 2011

ISSN: 2150-8097

DOI: 10.14778/2078331.2078341